Is a Morphologically Complex Language Really that Complex in Full-Text Retrieval?
Authors
Abstract
In this paper we show that the keyword variation of a morphologically complex language, Finnish, can be handled effectively for IR purposes by generating only the textually most frequent forms of each keyword. In theory, Finnish nouns have about 2,000 different forms, but most of these forms occur rarely. Corpus statistics showed that about 84-88 per cent of the occurrences of inflected noun forms belong to only six of the 14 possible cases. The resulting number of variant forms per keyword, at most 2 × 6 = 12, makes it feasible to try them all in a search. The retrieval performance of this frequent-form coverage was tested with three to twelve variant forms per keyword in two test collections, TUTK and the Finnish material of CLEF 2003. The results show that, with nine and twelve variant forms per keyword, the frequent keyword form generation method competes well with the gold standard, lemmatization.
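The idea of searching with only the most frequent forms can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table `FREQUENT_FORMS`, the function names, the choice of six cases (nominative, genitive, partitive, inessive, elative, illative), the frequency ordering, and the OR-group query syntax are all assumptions made for illustration. The abstract does not name the six cases or the query format.

```python
from typing import Dict, List

# Hypothetical hand-written table of inflected forms for one Finnish noun.
# The cases chosen and their ordering are illustrative assumptions, not data
# from the paper: six singular forms followed by six plural forms.
FREQUENT_FORMS: Dict[str, List[str]] = {
    "talo": [
        "talo", "talon", "taloa", "talossa", "talosta", "taloon",
        "talot", "talojen", "taloja", "taloissa", "taloista", "taloihin",
    ],
}


def expand_keyword(keyword: str, max_forms: int = 12) -> List[str]:
    """Return up to max_forms variant forms; fall back to the keyword itself."""
    forms = FREQUENT_FORMS.get(keyword, [keyword])
    return forms[:max_forms]


def build_query(keywords: List[str], max_forms: int = 12) -> str:
    """Turn each keyword into an OR-group of its variant forms.

    The "(a OR b)" syntax is an assumed generic format, not a specific
    search engine's query language.
    """
    groups = []
    for keyword in keywords:
        forms = expand_keyword(keyword, max_forms)
        groups.append("(" + " OR ".join(forms) + ")")
    return " ".join(groups)


if __name__ == "__main__":
    print(build_query(["talo"], max_forms=3))
```

Varying `max_forms` between 3 and 12 mirrors the range of variant-form counts tested in the abstract. A real system would need a morphological generator rather than a hand-written table.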
Similar resources
GéDériF: Automatic Generation and Analysis of Morphologically Constructed Lexical Resources
One of the most frequent problems in text retrieval comes from the large number of words encountered that are not listed in general language dictionaries. However, it is very often the case that these words are morphologically complex, and as such have a meaning which is predictable on the basis of their structure. Furthermore, such words typically belong to specialized language uses (e.g. scient...
A New Method for Automatic Indexing and Keyword Extraction for Information Retrieval and Text Clustering
Persian words are written in diverse forms, and a fixed set of rules cannot cover all of their grammatical variants, so extracting keywords automatically from Persian texts is difficult and complex. This thesis attempts to use linguistic information and a thesaurus to provide more meaningful keywords. Using the symbol system is structured network can be keywords, i...
The Effect of Raising Morphological Decomposition Awareness on Lexical Knowledge of Complex English Words
Lexical knowledge of complex English words is an important part of language skills and crucial for fluent language use. This study aimed to assess the role of morphological decomposition awareness as a vocabulary learning strategy on learners’ productive and receptive recall and recognition of complex English words. University students majoring in English at the...
An Extensible Query Language for Content Based Image Retrieval
One of the most important parts of every search engine is the query interface. Complex interfaces may cause users to struggle to learn how to use them. An example is the query language SQL. It is really powerful, but usually remains hidden from the common user. On the other hand, the usage of current languages for Internet search engines is very simple and straightforward. Even beginners are able t...
Using Syllables As Indexing Terms in Full-Text Information Retrieval
This paper describes empirical results of information retrieval in 13 languages of the Cross Language Evaluation Forum (CLEF) collection, augmented with results for Turkish, using syllables as a means to manage morphological variation in the languages. This kind of approach has been used in speech retrieval (e.g. Larson and Eickeler 2003), but for some reason it has not been much tried out in text...